Who’s who?
What’s the D-Lab?
FUNDAMENTALS: Some of you may be brand spankin’ new to R. If that’s the case, today might be a whirlwind for you, but please ask questions! (No shame.) I will do my best to explain things as I go, while also keeping time constraints in mind. Also, here’s a good starting R tutorial from our Stats Dept: http://www.stat.berkeley.edu/~spector/R.pdf
Geographic: expressed as angles in two dimensions that determine points on a globe, usually using latitude and longitude; a.k.a. unprojected coordinates
Cartesian: expressed as points on a Cartesian plane (i.e. x and y axes), after the globe’s surface has been projected onto the plane; a.k.a. projected coordinates (more on this in a bit)
Note that, depending where you read, there are others, but the distinction between these two types is fundamentally important.
which serves as a mathematical model (approximation) of the true, lumpy geoid that is our planet.
Because strict spheroids and ellipsoids do not allow for important local variations in elevation, they are used as the bases for the definition of various geodetic datums, which then serve as the mathematical models that provide the reference points against which coordinates can be expressed. Thus, the spheroid/ellipsoid and the geodetic datum are two of the main components we must consider when working with the coordinate reference systems (CRS) of our datasets.
We then use the CRS as the basis against which any point on the earth’s surface can be positioned.
Here is an example of a common type of CRS, called Universal Transverse Mercator (UTM). Some of the data we’ll work with will be in UTM.
A projection is a scheme for representing the globe’s surface on a flat plane.
Literally imagine placing the globe inside a cylinder (a cylindrical projection) or a cone (a conical projection), for example, then putting a light inside it (jack-o’-lantern-style), capturing the image on the cylinder, cone, or what-have-you, and then cutting it and unfolding it flat.
As you can probably imagine, different datasets in different projections don’t jibe all that well.
Also, it is important to be aware that it is mathematically impossible to devise a projection that maintains the true areas, sizes, angular relations, and shapes of geographic entities (i.e. ‘something’s gotta give’). So when choosing a projection we often want to consider which of these properties is most important to preserve for our purposes (i.e. do we want an equal-area projection? an equal-angle projection? etc.)
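To build intuition for why some distortion is unavoidable, here’s a quick back-of-the-envelope sketch (assuming a simple spherical Earth of radius ~6371 km, a simplification of the ellipsoids discussed above) of how the ground distance spanned by one degree of longitude shrinks toward the poles:

```r
# Approximate km spanned by 1 degree of longitude at several latitudes,
# assuming a spherical Earth of radius 6371 km (a simplification!)
lats = c(0, 30, 60)
km_per_deg_lon = (pi / 180) * 6371 * cos(lats * pi / 180)
round(km_per_deg_lon)  # roughly 111, 96, 56
```

A flat map that draws every degree of longitude at a constant width therefore necessarily stretches areas near the poles.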
This spawns some pretty passionate opinions about map projections:
Vector: points, lines, or polygons (or sets of these) are expressed as connected series of points; the number of points used per real-world distance determines the resolution of this representation, and thus the level of detail with which we represent the real-world complexity of a geographic feature. (Consider, for example, that the most accurate map of the CA coast would be a 1:1 scaled replica of the coast, i.e. the coast itself…)
Basically, think of a ‘connect-the-dots’ model.
Raster: continuously spatially distributed variables are often represented by gridded cells, each having a location (which can be defined as the coordinates of its center, or of its lower-left corner, etcetera) and at least one value (for the variable(s) of interest). Cells will have a fixed cell-size (typically expressed as the real-world distance represented by a cell-side, in either distance or degrees), and this cell-size will determine the resolution (i.e. level of detail) with which we represent the real-world complexity of this variable.
Basically, think of a ‘color-by-number’ model.
(Indeed, digital photos are commonly saved as rasters, usually with each cell having 3 values, for red, green, and blue.)
In some use-cases, one type simply makes a lot more sense than the other (e.g. temperature is usually a raster, jurisdictional boundaries are usually vectors).
In other cases, it may actually depend on the scale of the analysis (e.g. it might make sense to think of a road as vector at the scale of a city, but as a raster at the scale of a neighborhood).
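As a minimal illustrative sketch of the two models (using the sp and raster packages that we’ll load below; the coordinates and values here are made up):

```r
library(sp)
library(raster)

# Vector: three made-up points, stored as explicit coordinate pairs
pts = SpatialPoints(cbind(c(0, 1, 2), c(0, 1, 0)))

# Raster: a 3x3 grid covering the same area, with one value per cell
r = raster(nrows = 3, ncols = 3, xmn = 0, xmx = 3, ymn = 0, ymx = 3)
values(r) = 1:9
res(r)  # the cell-size (resolution): 1 x 1
```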
Here is a smattering of examples of common spatial analysis methods. By no means comprehensive, but to give you a taste. (And this includes some operations that we will use in our workshop today.)
This should all make more sense as we start to play with actual data.
Most of the data we will be working with is vector data, but we’ll also briefly see some raster data at the end.
Here goes!…
Let’s start off with some good boilerplate code for getting everyone’s workspace, working directory, and packages set up (Gleaned from Shinhye Choi’s and Patty Frontiera’s spatial-R workshop, which you should register for if you’d be interested in learning more after today’s material! Check http://gif.berkeley.edu/support/workshops.html for scheduling of this and other workshops.)
NOTE: Be sure to replace my working directory with the path to yours!
#clear workspace
rm(list = ls())
#set your working directory
#setwd("/home/ihavehands/Hidden_Desktop/berk/my_workshops/r_spatial/") #NOTE: REPLACE WITH YOUR LOCAL DIRECTORY
#check for packages, install those you don't already have
required.pkg <- c('utils', 'raster', 'sp', 'maptools', 'readxl', 'dismo', 'rgeos') #NOTE: LIST ALL PACKAGES HERE
#required.pkg <- c('utils', 'raster', 'rgdal', 'sp', 'maptools', 'readxl', 'dismo', 'rgeos') #NOTE: LIST ALL PACKAGES HERE
pkgs.not.installed <- required.pkg[!sapply(required.pkg, function(p) require(p, character.only=T))]
## Loading required package: raster
## Loading required package: sp
## Loading required package: maptools
## Checking rgeos availability: TRUE
## Loading required package: readxl
## Loading required package: dismo
## Loading required package: rgeos
## rgeos version: 0.3-22, (SVN revision 544)
## GEOS runtime version: 3.5.0-CAPI-1.9.0 r4084
## Linking to sp version: 1.2-4
## Polygon checking: TRUE
install.packages(pkgs.not.installed, dependencies=TRUE)
## Installing package into 'C:/Users/drew.hart/Documents/R/win-library/3.2'
## (as 'lib' is unspecified)
## package 'rgdal' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\drew.hart\AppData\Local\Temp\RtmpKorgfv\downloaded_packages
# Now load all libraries
lapply(required.pkg, library, character.only = TRUE)
## [[1]]
## [1] "rgeos" "dismo" "readxl" "maptools" "raster"
## [6] "sp" "rmarkdown" "stats" "graphics" "grDevices"
## [11] "utils" "datasets" "methods" "base"
##
## (elements [[2]] through [[7]] are identical to [[1]] above)
Now we’ll download the data for today’s examples.
Use your browser to navigate to https://www.dropbox.com/sh/avo94zu7nijgxw2/AAAkFrVMIeOTwGjnrhwsXjHVa?dl=0 and download and unzip the archive of all of the data located there.
Please be sure to unzip this into the directory of the Github repository that you already downloaded and unzipped. This will serve as your ‘working directory’ (i.e. the main directory you’ll be accessing from R).
These are the modern Peru border and the Mita region border.
First, we’ll need to stipulate the CRS and projection of these data. We can do this by providing the correct arguments in the form of a proj4string, which is a common standard used for defining projections.
#set the correct proj4string for the Mita and Peru data
#(you can get this information from your metadata, or from reading your dataset into ArcGIS or QGIS and inspecting, or from GDAL's command line tools, among other ways)
mita_and_peru_proj4 = CRS("+proj=utm +zone=18 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
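For reference, here is a rough annotation of what each piece of that proj4string means (a sketch of the conventions; see the PROJ documentation for the full story):

```r
# +proj=utm      projection family (Universal Transverse Mercator)
# +zone=18       the UTM zone covering this part of Peru
# +south         southern-hemisphere variant of the zone
# +ellps=WGS84   reference ellipsoid
# +datum=WGS84   geodetic datum
# +units=m       coordinates are expressed in meters
# +no_defs       ignore any defaults from proj's configuration files
```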
And now we can read in the data!
#read in the Mita border (this is a bunch of lines)
# DATA SOURCE: http://scholar.harvard.edu/files/dell/files/ecta8121_0.pdf
mita = readShapeLines('./Dell_raw_data_files/MitaBoundary.shp', proj4string = mita_and_peru_proj4)
## Warning: use rgdal::readOGR or sf::st_read
#and read in the Peru map (this is a group of polygons)
peru = readShapePoly('./Dell_raw_data_files/peru_nw.shp', proj4string = mita_and_peru_proj4)
## Warning: use rgdal::readOGR or sf::st_read
Now, what do these things that we just read in look like? What’s in them?
mita
## class : SpatialLinesDataFrame
## features : 2
## extent : 646480, 945874, 8232697, 8516450 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=18 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs +towgs84=0,0,0
## variables : 4
## names : OBJECTID, FID_in2_po, Id, Shape_Leng
## min values : 1, 0, 0, 494734.966796
## max values : 2, 0, 0, 698693.867093
This just prints the mita object to your console (i.e. to ‘standard output’, or STDOUT, in programmer jargon).
Now, try typing mita$ and then hitting <Tab>
This should display the column names in mita’s ‘SpatialLinesDataFrame’. $ is a common symbol used for getting the columns out of a data.frame (more on the data.frame class in a bit…)
This ability to use tab-completion to explore an object’s contents is a habit well worth developing.
Now try tab-completing mita@.
This returns the components of an S4 object. R has both S3 and S4 objects (and also the more rarely seen R5 objects). Objects are instantiations of classes, which are data structures used to represent a certain ‘real thing’ (e.g. a PDF, an animal, a shapefile, etc.), and whose hierarchically structured contents are canonically defined by the code living in a script in base R or in a package.
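A couple of handy base-R calls for exploring an S4 object like mita, in case tab-completion isn’t available in your editor:

```r
class(mita)         # "SpatialLinesDataFrame"
slotNames(mita)     # the '@' components: "data" "lines" "bbox" "proj4string"
slot(mita, "bbox")  # equivalent to mita@bbox
head(mita@data)     # the attribute table (a data.frame)
```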
Now we’ll use this knowledge to pull things out of the mita object:
#get the lines component
mita_line = mita@lines
#And what is the type of this component?
typeof(mita@lines)
## [1] "list"
#Because it's a list, we'll use double-brackets to index things out of it:
mita_line_first = mita@lines[[1]]
#NOTE: Not displaying these things in the RMarkdown document because they're LONG! But feel free
#to print them out to your terminal!
#and what is the type of this component?
typeof(mita_line_first)
## [1] "S4"
#now we can tab-complete to explore this, and so on...
So you can see how this is a highly structured data object, used to store all of the bits and pieces we need in order to characterize our geospatial lines.
Also, super-helpfully, we can get a really good look at its (or any object’s) structure, almost as a roadmap of how to index into it and access the information we want:
str(mita)
## Formal class 'SpatialLinesDataFrame' [package "sp"] with 4 slots
## ..@ data :'data.frame': 2 obs. of 4 variables:
## .. ..$ OBJECTID : int [1:2] 1 2
## .. ..$ FID_in2_po: int [1:2] 0 0
## .. ..$ Id : int [1:2] 0 0
## .. ..$ Shape_Leng: num [1:2] 494735 698694
## .. ..- attr(*, "data_types")= chr [1:4] "N" "N" "N" "F"
## ..@ lines :List of 2
## .. ..$ :Formal class 'Lines' [package "sp"] with 2 slots
## .. .. .. ..@ Lines:List of 1
## .. .. .. .. ..$ :Formal class 'Line' [package "sp"] with 1 slot
## .. .. .. .. .. .. ..@ coords: num [1:1976, 1:2] 688884 689045 689291 689409 689527 ...
## .. .. .. ..@ ID : chr "0"
## .. ..$ :Formal class 'Lines' [package "sp"] with 2 slots
## .. .. .. ..@ Lines:List of 1
## .. .. .. .. ..$ :Formal class 'Line' [package "sp"] with 1 slot
## .. .. .. .. .. .. ..@ coords: num [1:2420, 1:2] 945642 945635 945665 945853 945874 ...
## .. .. .. ..@ ID : chr "1"
## ..@ bbox : num [1:2, 1:2] 646480 8232697 945874 8516450
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:2] "x" "y"
## .. .. ..$ : chr [1:2] "min" "max"
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
## .. .. ..@ projargs: chr "+proj=utm +zone=18 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs +towgs84=0,0,0"
The method we’ve been using is an excellent way to ‘dig into’ the structure and organization of any object that you’re reading in with any R package, in order to start making sense of how it’s arranged and what lives where. Good to get in the habit.
You can of course also read the docs, which should always stipulate the value (i.e. type, and its characteristics and constraints) of the output of a function. But, often they’re long, complicated, and make your eyes glaze over. So both approaches are important sources of info, but this is a bit more interactive and intuitive.
Now it’s time for you to try! Try to get down to the actual coordinate pairs, and extract the first 100:
[PRACTICE TIME!] .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Here’s the answer:
mita@lines[[1]]@Lines[[1]]@coords[1:100,]
## [,1] [,2]
## [1,] 688884.2 8510730
## [2,] 689044.9 8510704
## [3,] 689290.5 8510711
## [4,] 689409.4 8510563
## [5,] 689526.5 8510324
## [6,] 689779.0 8509922
## [7,] 689917.9 8509801
## [8,] 689983.4 8509774
## [9,] 690204.5 8509564
## [10,] 690439.4 8509468
## [11,] 690489.1 8509493
## [12,] 690832.8 8509493
## [13,] 690862.4 8509534
## [14,] 690919.9 8509555
## [15,] 690949.4 8509596
## [16,] 691472.8 8509965
## [17,] 691798.1 8510329
## [18,] 691877.2 8510367
## [19,] 692033.5 8510498
## [20,] 692241.2 8510761
## [21,] 692263.0 8510870
## [22,] 692257.0 8511044
## [23,] 692158.2 8511321
## [24,] 692117.1 8511634
## [25,] 691878.4 8512023
## [26,] 691843.1 8512367
## [27,] 691932.9 8512507
## [28,] 692325.5 8512875
## [29,] 692403.2 8513040
## [30,] 692505.2 8513180
## [31,] 692644.9 8513558
## [32,] 692710.0 8513674
## [33,] 693092.4 8514024
## [34,] 693257.9 8514232
## [35,] 693273.1 8514320
## [36,] 693257.9 8514555
## [37,] 693299.9 8514752
## [38,] 693250.5 8515279
## [39,] 693209.2 8515404
## [40,] 693203.8 8515781
## [41,] 693282.4 8515846
## [42,] 693398.2 8516025
## [43,] 693656.6 8516314
## [44,] 693777.5 8516407
## [45,] 693913.4 8516450
## [46,] 694208.4 8516436
## [47,] 694626.2 8516228
## [48,] 695050.0 8516123
## [49,] 695619.5 8516122
## [50,] 695729.9 8516055
## [51,] 695830.9 8515927
## [52,] 695828.0 8515776
## [53,] 695649.0 8515494
## [54,] 695455.5 8515317
## [55,] 695430.6 8515238
## [56,] 695368.8 8515156
## [57,] 695368.2 8514933
## [58,] 695428.9 8514817
## [59,] 695594.2 8514727
## [60,] 695887.4 8514615
## [61,] 696036.6 8514447
## [62,] 696124.0 8514305
## [63,] 696128.1 8514010
## [64,] 696022.6 8513735
## [65,] 696011.9 8513531
## [66,] 695946.2 8513394
## [67,] 695959.6 8513274
## [68,] 696027.8 8513213
## [69,] 696330.4 8513064
## [70,] 696464.5 8512957
## [71,] 696617.1 8512766
## [72,] 696780.8 8512461
## [73,] 697038.2 8512221
## [74,] 697644.8 8512110
## [75,] 697837.5 8512103
## [76,] 697995.0 8512139
## [77,] 698112.8 8512293
## [78,] 698389.8 8512562
## [79,] 698494.9 8512699
## [80,] 698567.1 8512843
## [81,] 698781.6 8513075
## [82,] 698862.6 8513068
## [83,] 699034.5 8513125
## [84,] 699229.0 8513081
## [85,] 699665.5 8512842
## [86,] 699987.5 8512750
## [87,] 700333.5 8512558
## [88,] 700470.9 8512520
## [89,] 700606.9 8512411
## [90,] 700761.6 8512235
## [91,] 700873.6 8512053
## [92,] 700967.5 8511785
## [93,] 700966.5 8511302
## [94,] 701004.0 8511020
## [95,] 700991.4 8510685
## [96,] 701129.5 8510514
## [97,] 701318.9 8510443
## [98,] 701435.0 8510301
## [99,] 701485.1 8510053
## [100,] 701550.8 8509977
And as it turns out, there is a method (i.e. a function built into an object of a certain class) that will return the coordinates for an object. For the future, this obviates the complicated syntax that we just typed. Here’s the answer using that instead:
coordinates(mita)[[1]][[1]][1:100,]
## [,1] [,2]
## [1,] 688884.2 8510730
## [2,] 689044.9 8510704
## [3,] 689290.5 8510711
## [4,] 689409.4 8510563
## [5,] 689526.5 8510324
## [6,] 689779.0 8509922
## [7,] 689917.9 8509801
## [8,] 689983.4 8509774
## [9,] 690204.5 8509564
## [10,] 690439.4 8509468
## [11,] 690489.1 8509493
## [12,] 690832.8 8509493
## [13,] 690862.4 8509534
## [14,] 690919.9 8509555
## [15,] 690949.4 8509596
## [16,] 691472.8 8509965
## [17,] 691798.1 8510329
## [18,] 691877.2 8510367
## [19,] 692033.5 8510498
## [20,] 692241.2 8510761
## [21,] 692263.0 8510870
## [22,] 692257.0 8511044
## [23,] 692158.2 8511321
## [24,] 692117.1 8511634
## [25,] 691878.4 8512023
## [26,] 691843.1 8512367
## [27,] 691932.9 8512507
## [28,] 692325.5 8512875
## [29,] 692403.2 8513040
## [30,] 692505.2 8513180
## [31,] 692644.9 8513558
## [32,] 692710.0 8513674
## [33,] 693092.4 8514024
## [34,] 693257.9 8514232
## [35,] 693273.1 8514320
## [36,] 693257.9 8514555
## [37,] 693299.9 8514752
## [38,] 693250.5 8515279
## [39,] 693209.2 8515404
## [40,] 693203.8 8515781
## [41,] 693282.4 8515846
## [42,] 693398.2 8516025
## [43,] 693656.6 8516314
## [44,] 693777.5 8516407
## [45,] 693913.4 8516450
## [46,] 694208.4 8516436
## [47,] 694626.2 8516228
## [48,] 695050.0 8516123
## [49,] 695619.5 8516122
## [50,] 695729.9 8516055
## [51,] 695830.9 8515927
## [52,] 695828.0 8515776
## [53,] 695649.0 8515494
## [54,] 695455.5 8515317
## [55,] 695430.6 8515238
## [56,] 695368.8 8515156
## [57,] 695368.2 8514933
## [58,] 695428.9 8514817
## [59,] 695594.2 8514727
## [60,] 695887.4 8514615
## [61,] 696036.6 8514447
## [62,] 696124.0 8514305
## [63,] 696128.1 8514010
## [64,] 696022.6 8513735
## [65,] 696011.9 8513531
## [66,] 695946.2 8513394
## [67,] 695959.6 8513274
## [68,] 696027.8 8513213
## [69,] 696330.4 8513064
## [70,] 696464.5 8512957
## [71,] 696617.1 8512766
## [72,] 696780.8 8512461
## [73,] 697038.2 8512221
## [74,] 697644.8 8512110
## [75,] 697837.5 8512103
## [76,] 697995.0 8512139
## [77,] 698112.8 8512293
## [78,] 698389.8 8512562
## [79,] 698494.9 8512699
## [80,] 698567.1 8512843
## [81,] 698781.6 8513075
## [82,] 698862.6 8513068
## [83,] 699034.5 8513125
## [84,] 699229.0 8513081
## [85,] 699665.5 8512842
## [86,] 699987.5 8512750
## [87,] 700333.5 8512558
## [88,] 700470.9 8512520
## [89,] 700606.9 8512411
## [90,] 700761.6 8512235
## [91,] 700873.6 8512053
## [92,] 700967.5 8511785
## [93,] 700966.5 8511302
## [94,] 701004.0 8511020
## [95,] 700991.4 8510685
## [96,] 701129.5 8510514
## [97,] 701318.9 8510443
## [98,] 701435.0 8510301
## [99,] 701485.1 8510053
## [100,] 701550.8 8509977
We can plot our first hundred coordinates as points:
plot(mita@lines[[1]]@Lines[[1]]@coords[1:100,]) #as points
Or as lines:
plot(mita@lines[[1]]@Lines[[1]]@coords[1:100,], type = 'l') # as lines
Now, if we were to poke around for a bit in the peru object, we would see that some of the lines’ coordinate arrays have the same coordinate pair in the first and last row. What does this mean?…
plot(peru@polygons[[1]]@Polygons[[819]]@coords, type = 'l')
Makes sense! :) This should help shed light on why points, lines, and polygons are all cases of vector data. Also, you should now be able to easily imagine how you could use this approach to save polygons, polylines, polypolygons (called multipolygons)… and of course points!
It should be clear by now that this amounts to nothing more than plotting them, just as we would plot any other two-dimensional data, but with our x and y dimensions being determined by our CRS.
Here’s a quick-and-dirty map (border in black, Mita region in red)
plot(peru)
plot(mita, col = 'red')
Wait! They didn’t plot together. Annoying! Try this:
plot(peru)
lines(mita, col = 'red')
Or alternatively, try this!
plot(peru)
plot(mita, col = 'red', add = TRUE)
Neat, eh!? Straightforward enough, I hope?
Questions?
We’ll start by reading in the XLS file containing the district capitals as a data.frame. We can do this using the readxl package.
caps = read_excel('./Dell_raw_data_files/locations.xlsx')
Remember that we saw a SpatialLinesDataFrame before? And we mentioned the data.frame class? Let’s chat a bit more about the data.frame class:
A data.frame can be thought of in a number of ways:
The programmatic equivalent of spreadsheet (The obvious way, and the intended use)
An array in which different datatypes are allowed in each column (The intuitive way, but not actually how it’s implemented)
A list of data vectors, with each vector (i.e. column) having a name and being constrained to the same length (This is in fact how the data.frame class is implemented; being aware of this can sometimes be helpful.)
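A quick sketch of that last point, with a toy data.frame (the names and values here are made up):

```r
# A data.frame really is a list of equal-length, named column vectors
df = data.frame(city = c("Cusco", "Puno"), elev_m = c(3399, 3827))
is.list(df)         # TRUE
df[["elev_m"]]      # list-style extraction, same as df$elev_m
length(df)          # 2: the number of columns (i.e. list elements)
sapply(df, class)   # the class of each column can differ
```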
class(caps)
## [1] "tbl_df" "tbl" "data.frame"
We can see that, because we used the readxl package to read it in, our file was read in as a derived class defined by that package (tbl_df).
Let’s convert it to just a plain ol’ data.frame (to avoid problems downstream):
caps = as.data.frame(caps)
class(caps)
## [1] "data.frame"
Data frames are the bailiwick of R, and have a ton of methods (remember what this means?) and other functionalities (especially through other very useful packages, e.g. plyr, reshape2) that will make your data work all-powerful. If you will be working in R, definitely spend some time learning more about and practicing working with data.frames!
We were able to read these points in as a data frame like this, without any fancy packages, because R is currently unaware that these are geospatial data. In other words, our latitude and longitude columns are just two of a number of columns of data that just happen to be numerics.
Clearly, this wouldn’t work (or would be very messy and complicated) with line or polygon data, because in each row, instead of a value for lat and a value for lon, we would need a vector (or a vector of vectors of vectors…) of points!
But it works fine for points, which is why storage in XLS (or, more simply, CSV, i.e. comma-separated values) is typical.
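To see why flat-file storage is so natural for points, here’s a toy round-trip (the file name and values are hypothetical):

```r
# Each point is fully described by one row: a name plus a lon/lat pair
toy = data.frame(name = c("A", "B"), lon = c(-73.4, -72.1), lat = c(-13.7, -14.2))
write.csv(toy, "toy_points.csv", row.names = FALSE)
read.csv("toy_points.csv")  # comes back intact, one row per point
```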
However, we can’t just read in the data and run, as this would create a problem:
plot(peru)
points(caps$lon, caps$lat, col = 'blue') #PRO-TIP: Remember that lon is x, lat is y!
Where the heck are our points?
Oh, they’re way the heck out there…
#Get min and max lat and lon values
x_min = min(min(caps$lon), min(sapply(peru@polygons[[1]]@Polygons, function(x) min(x@coords[,1])))) - 100
y_min = min(min(caps$lat), min(sapply(peru@polygons[[1]]@Polygons, function(x) min(x@coords[,2])))) - 100
x_max = max(max(caps$lon), max(sapply(peru@polygons[[1]]@Polygons, function(x) max(x@coords[,1])))) + 100
y_max = max(max(caps$lat), max(sapply(peru@polygons[[1]]@Polygons, function(x) max(x@coords[,2])))) + 100
#Plot data again, this time using the min and max lat/lon values as our bounding box
plot(peru, xlim = c(x_min, x_max), ylim = c(y_min, y_max))
points(caps$lon, caps$lat, col = 'blue')
Why did this happen!?
Because we need to assign our points a CRS and projection! And of course, we need to make sure that they match those of our lines and polygons.
One way to do this (there are often various ways of doing a thing when you’re programming) is to use the sp package’s SpatialPointsDataFrame class to create a spatial points object, and then assign it the same CRS as above:
First, we will again create a proj4string for the caps’ coordinates.
(Note that currently, because we read them in as lat/lon from an Excel spreadsheet, they are simply unprojected, i.e. geographic rather than Cartesian coordinates.)
We will use the most common standard: the World Geodetic System’s ellipsoid, last revised in 1984 (hence WGS84):
caps_proj4 = CRS("+proj=longlat +ellps=WGS84")
sp_caps = SpatialPointsDataFrame(cbind(caps$lon, caps$lat), data = caps, proj4string = caps_proj4)
What did this do? Take a look at all of the information output from the following commands:
class(sp_caps)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
str(sp_caps)
## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
## ..@ data :'data.frame': 407 obs. of 9 variables:
## .. ..$ objectid : num [1:407] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ capital : chr [1:407] "Andahuaylas" "Andarapa" "Chiara" "Huancarama" ...
## .. ..$ lat : num [1:407] -13.7 -13.5 -13.9 -13.6 -13.8 ...
## .. ..$ lon : num [1:407] -73.4 -73.4 -73.7 -73.1 -73.5 ...
## .. ..$ ubigeo : num [1:407] 30201 30202 30203 30204 30205 ...
## .. ..$ near_fid : num [1:407] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..$ near_dist: num [1:407] 25691 13599 63499 10332 44394 ...
## .. ..$ near_x : num [1:407] 690205 688884 688884 712003 688884 ...
## .. ..$ near_y : num [1:407] 8509564 8510730 8510730 8499692 8510730 ...
## ..@ coords.nrs : num(0)
## ..@ coords : num [1:407, 1:2] -73.4 -73.4 -73.7 -73.1 -73.5 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
## ..@ bbox : num [1:2, 1:2] -74.9 -17.2 -68.5 -12.6
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
## .. .. ..$ : chr [1:2] "min" "max"
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
## .. .. ..@ projargs: chr "+proj=longlat +ellps=WGS84"
summary(sp_caps)
## Object of class SpatialPointsDataFrame
## Coordinates:
## min max
## coords.x1 -74.91666 -68.51000
## coords.x2 -17.18333 -12.58333
## Is projected: FALSE
## proj4string : [+proj=longlat +ellps=WGS84]
## Number of points: 407
## Data attributes:
## objectid capital lat lon
## Min. : 1.0 Length:407 Min. :-17.18 Min. :-74.92
## 1st Qu.:102.5 Class :character 1st Qu.:-15.23 1st Qu.:-73.44
## Median :204.0 Mode :character Median :-14.27 Median :-72.27
## Mean :204.0 Mean :-14.50 Mean :-72.21
## 3rd Qu.:305.5 3rd Qu.:-13.75 3rd Qu.:-71.36
## Max. :407.0 Max. :-12.58 Max. :-68.51
## ubigeo near_fid near_dist near_x
## Min. : 30101 Min. :1.000 Min. : 225.2 Min. :646480
## 1st Qu.: 40511 1st Qu.:1.000 1st Qu.: 24663.6 1st Qu.:690205
## Median : 50911 Median :1.000 Median : 53134.1 Median :796373
## Mean : 87340 Mean :1.496 Mean : 64778.6 Mean :796860
## 3rd Qu.: 81207 3rd Qu.:2.000 3rd Qu.: 97638.4 3rd Qu.:886456
## Max. :211307 Max. :2.000 Max. :253008.5 Max. :945874
## near_y
## Min. :8232697
## 1st Qu.:8299944
## Median :8420352
## Mean :8395338
## 3rd Qu.:8502540
## Max. :8516314
Now, let’s project this using the projection used for the Mita lines and Peru polygons:
A projection is a means of representing spatial locations from a non-Euclidean (i.e. curved) surface on a Euclidean (i.e. flat) surface. You can literally think of this as projecting the surface of the globe onto a plane (and indeed, there is a diversity of ways this can be done).
We can do this digitally using established algorithms that we can access through existing packages. The rgdal package, a mainstay of the R spatial world, will do this for us.
proj_caps = spTransform(sp_caps, crs(mita))
Now our map should plot!
plot(peru)
lines(mita, col = 'red')
points(proj_caps, col = 'blue')
We’ll plot this again, adding some nice bells and whistles, and write the output to a PDF.
To do this, we’ll start the graphics device driver for producing PDF plots, which will save our map to the filename we provide. Then we’ll create our map. Lastly, we’ll turn off the graphics device, at which point the map will be written to file.
#Create a normalized index as a toy plotting parameter
proj_caps$index = 3.0 * (proj_caps$near_dist - min(proj_caps$near_dist))/(max(proj_caps$near_dist) - min(proj_caps$near_dist)) + 0.5
#Define function for plotting colors along a gradient
#(Pulling code from the web can be fun!)
color.gradient <- function(x, colors=c("white", "black"), colsteps=100) {
return( colorRampPalette(colors) (colsteps) [ findInterval(x, seq(min(x),max(x), length.out=colsteps)) ] )
}
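A quick sanity check of what that helper returns (a vector of hex color strings, one per input value):

```r
# Endpoints of the input range map to the endpoint colors
color.gradient(c(0, 0.5, 1))
# first element is "#FFFFFF" (white), last is "#000000" (black),
# with an intermediate gray in between
```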
pdf('Dell_mapping_output.pdf')
plot(proj_caps@coords, pch = 21, cex = proj_caps$index, bg = color.gradient(proj_caps$index), main = 'Dell 2010 mita data: Example')
lines(peru)
lines(mita, col = 'red', lwd=2)
#once done plotting, turn off the graphics device
dev.off()
## png
## 2
#PLOT AGAIN FOR DISPLAY IN RMARKDOWN DOC
plot(proj_caps@coords, pch = 21, cex = proj_caps$index, bg = color.gradient(proj_caps$index), main = 'Dell 2010 mita data: Example')
lines(peru)
lines(mita, col = 'red', lwd=2)
Lastly, if you’re interested in making maps like this for publication, I definitely recommend getting acquainted with ggplot2. Outside the purview of our time here, unfortunately. You’ll find a ton of good example code online, which can serve as boilerplate for your own mapping exercises.
Now let’s run through a similar exercise, reinforcing some of what we’ve done, while adding some additional components along the way!
First, we’ll need to create the proper proj4string for our Africa datasets. It makes our lives easier that they all already share the same CRS, so one string to rule them all…
africa_proj4 = CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
Now go ahead and try loading the data for this section (explorer routes, African countries, and tribal regions).
[PRACTICE TIME!] .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Here’s how we can do this:
#load in explorer routes
#SOURCE: http://scholar.harvard.edu/files/nunn/files/nunn_wantchekon_aer_2011.pdf
routes = readShapeLines('./Pre_Colonial_Africa_Explorer_Routes/Explorer_Routes_Final.shp', proj4string = africa_proj4)
## Warning: use rgdal::readOGR or sf::st_read
#load in the Africa data
africa = readShapePoly('./Pre_Colonial_Africa_Explorer_Routes/African_base.shp', proj4string = africa_proj4)
## Warning: use rgdal::readOGR or sf::st_read
#and lastly, load the tribal groups data
#SOURCE: http://scholar.harvard.edu/files/nunn/files/empirical_slavery.pdf
tribes = readShapePoly('./Murdock_shapefile/borders_tribes.shp', proj4string = africa_proj4)
## Warning: use rgdal::readOGR or sf::st_read
Now, we should have a good idea of the structure and content of these objects, based on the output of the following:
routes
## class : SpatialLinesDataFrame
## features : 21
## extent : -15.17746, 39.75311, -33.81254, 33.86637 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0
## variables : 4
## names : Explorer_N, Year_Began, Year_End, Notes
## min values : Baker, 1768, 1773, Wissmann and Pogge to Nyangwe, Wissmann alone to Zanzibar
## max values : Wissmann, 1884, 1885, Wissmann and Pogge to Nyangwe, Wissmann alone to Zanzibar
africa
## class : SpatialPolygonsDataFrame
## features : 1
## extent : -25.36056, 51.41264, -34.82223, 37.53944 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0
## variables : 5
## names : OBJECTID, CONTINENT, Cnt_CONTIN, Shape_Leng, Shape_Area
## min values : 1, Africa, 475, 541.5582, 2558.702
## max values : 1, Africa, 475, 541.5582, 2558.702
Questions?
Next, we want to subset our routes data.frame for only those rows corresponding to voyages in or before 1875. Based on what we know about the structures, and on our knowledge of some basic R subsetting operations, this should also be straightforward.
First, how do we find out which column of our SpatialLinesDataFrame we should operate on? Try tab-completion…
[PRACTICE TIME!] .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
What do you think?
Looks like it makes the most sense to use the Year_End column, so that we will include only voyages that were completed by the end of calendar year 1875.
Let’s do that:
use_routes = routes[routes$Year_End <= 1875, ]
Note that because this is a spatial data.frame, which is a derived class of a data.frame, we can use all the same basic notation we would use for subsetting a data.frame, yet we can still plot it with proper geospatial referencing after doing that. In other words, the Spatial*DataFrame classes are written to provide intuitive data.frame functionalities for geospatial data objects.
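For instance (hypothetical examples, using the columns we saw in the routes object earlier):

```r
# Subset rows with a logical condition, just like a data.frame
early = routes[routes$Year_Began < 1800, ]

# Or keep all rows but only certain attribute columns
names_only = routes[, c("Explorer_N", "Year_End")]

# The results are still SpatialLinesDataFrames, so they still plot spatially
class(early)
```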
In other words (in a geometric/programmatic sense), we would like to know which tribal polygons intersect with the pre-1875 routes.
Geospatial packages provide handy functions to do exactly these sorts of spatial analysis tasks.
Here’s one!
intersect_tribes = intersect(tribes, use_routes)
Annoying note: Seemingly contradictorily, the intersect function belongs to the raster package. However, it serves our purposes because it can assess the intersection between two vector (i.e. non-raster) datasets and return the result still in a Spatial*DataFrame, which is precisely what we want. The other obvious option would be the gIntersection function, in the rgeos package, but its particular functionality would actually be less convenient. All of which highlights that there are many ways to do a thing! In navigating among them, Google will be your best friend. You’ll want to search for your needs using the proper jargon, since it serves as a universal standard for discussing this stuff. You’ll likely find nicely framed questions and multiple (vote-ranked) answers for what you need, mainly on StackExchange (e.g. http://gis.stackexchange.com/questions/157194/difference-between-intersect-raster-and-gintersection-rgeos-in-polygons-r).
How can we tell how many of the tribal-group areas intersected?…
length(intersect_tribes)
## [1] 180
It appears that 180 of the 843 tribal group areas intersected.
Let’s try to plot the contacted (i.e. intersected) tribal areas in purple, and leave the uncontacted as white.
Note: A map of this sort is known as a choropleth map. This is a very common way to visualize data by polygon. The data could be either discrete (i.e. taking on categorical or stepwise distinct values, represented by distinct colors, like our contacted/uncontacted binary variable) or continuous (in which case values would be mapped onto a colorbar by whatever scheme were deemed appropriate).
Go ahead, take a few minutes and give it a go!
[PRACTICE TIME!] .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Here’s a solution:
plot(tribes, col = 'white')
plot(intersect_tribes, col = 'purple', add = T)
And we can add our routes as well, to make sure that the intersect function actually did what we expected it to:
plot(tribes, col = 'white')
plot(intersect_tribes, col = 'purple', add = T)
lines(routes, col = 'yellow', lwd = 2)
lines(use_routes, col = 'green', lty = 2, lwd = 3)
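As an aside, the choropleth idea doesn't depend on our real data at all. Here's a hedged toy sketch that builds two made-up unit-square polygons with sp and colors them by a binary variable (all object names here are invented for illustration):

```r
library(sp)

# Build one made-up unit-square polygon, offset horizontally by x0:
make_square = function(x0, id) {
  ring = cbind(c(x0, x0 + 1, x0 + 1, x0, x0),
               c(0, 0, 1, 1, 0))
  Polygons(list(Polygon(ring)), ID = id)
}
polys = SpatialPolygons(list(make_square(0, 'a'), make_square(1, 'b')))

# A binary variable, mapped to fill colors just like contacted/uncontacted:
contacted = c(TRUE, FALSE)
plot(polys, col = ifelse(contacted, 'purple', 'white'))
```

The `col` argument is recycled polygon-by-polygon, which is exactly the mechanism behind the purple/white map above.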
We don’t have a lot of time to work with raster data here, so we’ll just briefly read some in and plot it, to demonstrate what it looks like and give you a feel for how it differs from vector data.
A quick way to get some very basic global datasets is with the getData function in the raster package. We can retrieve climatic data (min temp, max temp, annual precip), elevation, admin boundaries, and some other stuff.
Let’s get precip, at 10’ resolution from WorldClim:
ppt = getData('worldclim', var = 'prec', res = 10)
And now we plot it and we’re done! So easy!
plot(ppt)
Psych!
What is this?!
It appears this gives us average monthly precip. But we want average annual.
So, let’s just sum them!
But to figure out how to do this, let’s first introspect our structure:
ppt
## class : RasterStack
## dimensions : 900, 2160, 1944000, 12 (nrow, ncol, ncell, nlayers)
## resolution : 0.1666667, 0.1666667 (x, y)
## extent : -180, 180, -60, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## names : prec1, prec2, prec3, prec4, prec5, prec6, prec7, prec8, prec9, prec10, prec11, prec12
## min values : 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## max values : 885, 736, 719, 820, 955, 1850, 2088, 1386, 904, 980, 893, 914
We seem to have summary information presented to us very similarly to how it was for our vector datasets. This is the result of the years of standardizing development that have been put into R’s core spatial packages. (Thank you, stranger-people.)
However, notice that this object is an instantiation of a particular class designed for raster data:
typeof(ppt)
## [1] "S4"
class(ppt)
## [1] "RasterStack"
## attr(,"package")
## [1] "raster"
Luckily, the RasterStack class is pretty straightforward, and can be thought of exactly as it sounds. In other words, if each raster is a single layer of gridded, valued cells, then this is just a stack of those. 12 layers in total!
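If you want to poke at a stack's layers directly, here's a minimal sketch using tiny made-up 2x2 rasters, so it runs without the WorldClim download:

```r
library(raster)

# Two tiny layers with made-up values, stacked like the monthly layers:
r1 = raster(nrows = 2, ncols = 2, vals = c(1, 2, 3, 4))
r2 = raster(nrows = 2, ncols = 2, vals = c(5, 6, 7, 8))
s  = stack(r1, r2)

nlayers(s)   # 2 layers in this toy stack (ppt has 12)
s[[1]]       # double brackets pull out a single layer, as a RasterLayer
```

The same `[[ ]]` indexing works on ppt, e.g. `ppt[[1]]` would give you just the January layer.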
So how can we sum the monthly values to get a single layer of average annual precip?
yr_ppt = sum(ppt)
NO PSYCH! Truly. That’s it.
(Note that what we just did is perhaps the simplest form of raster algebra.)
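For the curious, the same cell-wise logic extends to other arithmetic. Here's a hedged toy sketch (made-up 2x2 values, standing in for the precip layers) of a couple of raster-algebra operations:

```r
library(raster)

# A made-up 2x2 raster standing in for one precip layer:
r     = raster(nrows = 2, ncols = 2, vals = c(10, 20, 30, 40))
dbl   = r * 2                # cell-wise arithmetic on one layer
total = sum(stack(r, dbl))   # cell-wise sum across layers, as with sum(ppt)
values(total)                # 30 60 90 120
```

Every operation lines cells up by position and applies the arithmetic cell-by-cell, which is why `sum(ppt)` collapses 12 monthly layers into one annual layer.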
Now, because we have global data, we’ll want to home in on Africa. In other words, we’ll want to feed in a new bounding box. So why don’t we just extract this from our largest vector layer?
box = bbox(africa)
And now we can use this box as the extent for our plot!
plot(yr_ppt, ext = box)
(Also, here’s a cool alternative for getting a bounding box, though it will only work if you run it interactively (i.e. from the command line):
plot(yr_ppt)
draw_box = drawExtent()
plot(yr_ppt, ext = draw_box)
Neat, eh?)
Of course, we’ll again need to reproject our raster to match our other data’s projection:
#NOTE: I CAN'T CURRENTLY RUN THIS, BECAUSE OF MY ISSUES WITH RGDAL,
#BUT FOR SOME REASON IT DOESN'T SEEM TO MATTER FOR MAPPING PURPOSES.
#NONETHELESS, I'M LEAVING THIS, AND IT SHOULD WORK ONCE I'VE DEBUGGED MY RGDAL ISSUE
#reproj_yr_ppt = projectRaster(yr_ppt, crs = africa_proj4)
And we’re ready to plot everything together!
I leave that as your final practice for the day! Take the remaining time to look over everything we’ve done, Google whatever you like, and try to figure out how to do the following:
- Plot the average annual precip raster as your basemap
- Add the border of Africa
- Add the borders of the historical tribal group areas
- Color the polygons of those areas that intersect the pre-1875 routes in transparent grey
- Add the pre-1875 routes, using dotted red lines
- Save the map in your current working directory, using whatever filename you like
[PRACTICE TIME!] .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
pdf('africa_map_practice.pdf')
plot(yr_ppt, ext = box, main = "African tribal area and explorer route data: Example")
plot(africa, add = T)
plot(tribes, add = T)
plot(intersect_tribes, col = '#45454555', add = T)
#PRO-TIP: this color is expressed as an 8-digit hexadecimal string (digits 0-9 and A-F), with the last
#pair of digits indicating transparency (values less than FF are increasingly transparent)
lines(use_routes, col = 'red', lwd = 2) #lwd just makes the lines thicker, so easier to see
dev.off()
## png
## 2
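To convince yourself how those trailing alpha digits work, here's a quick sketch using base R's col2rgb (no spatial data needed):

```r
# An 8-digit hex color reads as 'RRGGBBAA'. The final pair, 55 here, is
# the alpha (opacity) channel; FF is fully opaque, 00 fully transparent.
semi_grey = '#45454555'
rgba = col2rgb(semi_grey, alpha = TRUE)
rgba['alpha', ]   # 0x55 = 85 out of 255, i.e. about 33% opaque
```

So our '#45454555' fill is a grey at roughly one-third opacity, which is why the routes stay visible underneath the contacted polygons.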
#PLOT AGAIN FOR DISPLAY IN RMARKDOWN DOC
plot(yr_ppt, ext = box, main = "African tribal area and explorer route data: Example")
plot(africa, add = T)
plot(tribes, add = T)
plot(intersect_tribes, col = '#45454555', add = T)
#PRO-TIP: this color is expressed as an 8-digit hexadecimal string (digits 0-9 and A-F), with the last
#pair of digits indicating transparency (values less than FF are increasingly transparent)
lines(use_routes, col = 'red', lwd = 2) #lwd just makes the lines thicker, so easier to see